Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch harakiri graceful signal in middlware and log debug info #15447

Merged
merged 1 commit into from
Aug 29, 2024

Conversation

kdelee
Copy link
Member

@kdelee kdelee commented Aug 16, 2024

SUMMARY

Requests that exceed a certain timeout can be killed by uwsgi
But it begs the question -- what were they doing?!

By configuring uwsgi harakiri to send a SIGABRT (signal 6) as a first "graceful" signal, we can catch it in middleware and log a message with the stack where code was executing (probably stuck) and more information about the request

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API
AWX VERSION
devel
ADDITIONAL INFORMATION

inspired by getsentry/sentry-python#1783

May require update to uwisgi.ini in operator/any other installer that lays down uwsgi config if doesn't just use tools/ansible/roles/dockerfile/files/uwsgi.ini from here

Sample traceback:
(this was contrived by putting "time.sleep(300)" in the root view)

tools_awx_1       | Fri Aug 16 15:30:15 2024 - *** HARAKIRI ON WORKER 1 (pid: 13747, try: 1, graceful: yes) ***                                                                                                                                                                                                             
tools_awx_1       | Fri Aug 16 15:30:15 2024 - HARAKIRI !!! worker 1 status !!!                                                                               
tools_awx_1       | Fri Aug 16 15:30:15 2024 - HARAKIRI [core 0] 172.23.0.1 - GET /api/ since 1723822210                                                      
tools_awx_1       | Fri Aug 16 15:30:15 2024 - HARAKIRI !!! end of worker 1 status !!!                                                                        
tools_awx_1       | Fri Aug 16 15:30:15 2024 - HARAKIRI: graceful termination attempt on worker 1 with signal 31. Next harakiri: 1723822217                                                                                                                                                                                 
tools_awx_1       | Fri Aug 16 15:30:15 2024 - HARAKIRI triggered by worker 1 core 0 !!!                                                                      
tools_awx_1       | 2024-08-16 15:30:15,942 ERROR    [60200fba] awx.main.middleware Catching harakiri graceful signal for b3762186-1014-4608-a1f5-225d9f1bcd4a with method: GET path: /api/                                                                                                                                 
tools_awx_1       | 2024-08-16 15:30:15,948 ERROR    [60200fba] awx.main.middleware Received harakiri graceful signal while in stack:   File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch                                                                          
tools_awx_1       |     response = handler(request, *args, **kwargs)                                                                                          
tools_awx_1       |   File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/decorators.py", line 46, in _wrapper                                                                                                                                                                                          
tools_awx_1       |     return bound_method(*args, **kwargs)                                                                                                  
tools_awx_1       |   File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/decorators.py", line 134, in _wrapper_view                                                                                                                                                                                    
tools_awx_1       |     response = view_func(request, *args, **kwargs)                                                                                        
tools_awx_1       |   File "/awx_devel/awx/api/views/root.py", line 53, in get                                                                                
tools_awx_1       |     time.sleep(300)                                        
tools_awx_1       |   File "/awx_devel/awx/main/harakiri_middleware.py", line 15, in handle_signal                                                            
tools_awx_1       |     logger.error(f"Received harakiri graceful signal while in stack: {''.join(traceback.format_stack()[-5:])}")                 

@kdelee
Copy link
Member Author

kdelee commented Aug 19, 2024

Requires ansible/awx-operator#1943

@AlanCoding
Copy link
Member

2024-08-16 15:30:15,948 ERROR [60200fba] awx.main.middleware Received harakiri graceful signal while in stack: File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch

Can you explain to me what's happening with the [-5:]? I get it in broad terms but can't connect it to the specific items I see in that line. It also has some fragility, that maybe isn't needed.

@kdelee
Copy link
Member Author

kdelee commented Aug 21, 2024

2024-08-16 15:30:15,948 ERROR [60200fba] awx.main.middleware Received harakiri graceful signal while in stack: File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch

Can you explain to me what's happening with the [-5:]? I get it in broad terms but can't connect it to the specific items I see in that line. It also has some fragility, that maybe isn't needed.

this is just to limit depth of stack. When it passed w/o that was just a mile high and may be distracting when really you just want basically last few frames. We could just get the whole thing and not worry about any potential index error.

@kdelee
Copy link
Member Author

kdelee commented Aug 21, 2024

new example with move to django-ansible-base

2024-08-21 18:20:28,999 ERROR    [11ec4fb9] ansible_base.lib.middleware.logging.log_request Catching graceful timeout signal for GET path: /api/
2024-08-21 18:20:29,004 ERROR    [11ec4fb9] ansible_base.lib.middleware.logging.log_request Received graceful timeout signal while in stack:   File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/wsgi.py", line 124, in __call__
    response = self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/base.py", line 140, in get_response
    response = self._middleware_chain(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/debug_toolbar/middleware.py", line 48, in __call__
    return self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django_guid/middleware.py", line 75, in middleware
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/awx_devel/django-ansible-base/ansible_base/lib/middleware/logging/log_request.py", line 26, in __call__
    return self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/corsheaders/middleware.py", line 56, in __call__
    result = self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/social_django/middleware.py", line 28, in __call__
    return self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/crum/__init__.py", line 97, in __call__
    response = self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
    response = response or self.get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib64/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
  File "/awx_devel/awx/api/generics.py", line 351, in dispatch
    return super(APIView, self).dispatch(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/decorators.py", line 46, in _wrapper
    return bound_method(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/django/utils/decorators.py", line 134, in _wrapper_view
    response = view_func(request, *args, **kwargs)
  File "/awx_devel/awx/api/views/root.py", line 49, in get
    import time; time.sleep(100)
  File "/awx_devel/django-ansible-base/ansible_base/lib/middleware/logging/log_request.py", line 17, in handle_signal
    logger.error(f"Received graceful timeout signal while in stack: {''.join(traceback.format_stack())}")

this is with contrived example of import time; time.sleep(100) put into the root API view

@kdelee kdelee force-pushed the harakiri_middleware branch 2 times, most recently from 67e031d to 7eb3de6 Compare August 22, 2024 16:36
@kdelee kdelee force-pushed the harakiri_middleware branch from 7eb3de6 to 34b686d Compare August 27, 2024 17:05
@kdelee
Copy link
Member Author

kdelee commented Aug 27, 2024

ansible/django-ansible-base#581 merged...hoping this will help these PR checks pass

@kdelee
Copy link
Member Author

kdelee commented Aug 27, 2024

also need ansible/awx-operator#1943

@kdelee kdelee force-pushed the harakiri_middleware branch from 34b686d to f793bda Compare August 28, 2024 14:14
Copy link

@kdelee kdelee merged commit 444af2b into devel Aug 29, 2024
19 checks passed
@kdelee kdelee deleted the harakiri_middleware branch August 29, 2024 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants